Week 2
Old Dominion University
What would be a good forecast, and why?
What is the “goal” or “objective”?
Does visualizing the data change your mind?
Point Forecast: a best guess (\(\hat{y}\)) for an unknown value.
Forecast Distribution: a distribution, or range, of possibilities and their corresponding likelihoods/probabilities.
Point Forecast (again): a summary statistic of a forecast distribution
\[ e = y - \hat{y} \]
Will there always be forecast error?
So long as \(y\) is not deterministic, yes!
Of course, some errors are worse than others.
\[ L(e): \text{Loss Function} \]
What are some examples of Loss Functions?
Absolute Loss
\[L(e) = |e| \]
Quadratic Loss
\[L(e) = e^2 \]
What are some features (good / bad) about these Loss Functions?
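As a quick sketch (not from the slides), the two loss functions can be compared on a handful of errors; note how quadratic loss punishes large errors disproportionately:

```r
# Sketch: compare absolute and quadratic loss on a few forecast errors
e <- c(-4, -1, 0, 1, 4)

abs_loss  <- abs(e)   # L(e) = |e|
quad_loss <- e^2      # L(e) = e^2

# Quadratic loss penalizes the error of 4 sixteen times as much as the
# error of 1; absolute loss penalizes it only four times as much.
rbind(error = e, abs_loss = abs_loss, quad_loss = quad_loss)
```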
Forecasts seek to minimize risk. Risk is defined as the expected loss:
\[R(\hat{y}) = E[L(e)] = E[L(y - \hat{y})]\]
For quadratic loss:
\[ E[(y - \hat{y})^2] = E[y^2] - 2\hat{y}E[y] + \hat{y}^2\]
Remember, we know \(E[y]\). From here, we can minimize this function with respect to \(\hat{y}\) by taking the derivative and setting it to zero:
\[\frac{\partial}{\partial \hat{y}}\left(E[y^2] - 2\hat{y}E[y] + \hat{y}^2\right) = -2E[y] + 2\hat{y} = 0\]
Solving for \(\hat{y}\) gives \(\hat{y} = E[y]\). So, the \(\hat{y}\) that minimizes our risk is the mean of the distribution of \(y\).
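A quick simulation (not in the slides) confirms this numerically: over a grid of candidate forecasts, average quadratic loss is smallest at (approximately) the sample mean.

```r
# Sketch: verify numerically that the mean minimizes expected quadratic loss
set.seed(1)
y <- rnorm(10000, mean = 5, sd = 2)

# Average quadratic loss for a grid of candidate forecasts
candidates <- seq(3, 7, by = 0.01)
risk <- sapply(candidates, function(yhat) mean((y - yhat)^2))

# The risk-minimizing candidate sits (up to grid spacing) at the sample mean
candidates[which.min(risk)]
mean(y)
```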
So, if \(\hat{y}^* = E[y]\), how do we calculate \(E[y]\)?
Continuous: \(E[y] = \int_{-\infty}^{\infty} y \, f(y) \, dy\)
Discrete: \(E[y] = \sum_i y_i \, P(y = y_i)\)
Empirical: \(\bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i\)
Suppose you flip a weighted coin with the following distribution:
\[P(y = T) = \frac{3}{7}\] \[P(y = H) = \frac{4}{7}\]
If the coin comes up tails, you are given $40. If the coin comes up heads, you owe $25.
What is the expected value of the coin flip? What is the expected “risk” (i.e. \(E[L(e)]\))?
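A worked version of the calculation, using the payoffs and probabilities from the slide (risk here is evaluated under quadratic loss, so at \(\hat{y} = E[y]\) it equals the variance of \(y\)):

```r
# Payoffs and probabilities from the slide
payoff <- c(40, -25)          # tails pays $40, heads costs $25
prob   <- c(3/7, 4/7)

# Optimal point forecast under quadratic loss: the mean
ey <- sum(payoff * prob)      # E[y] = 20/7, about $2.86

# Expected quadratic loss at yhat = E[y] is the variance of y
risk <- sum((payoff - ey)^2 * prob)
ey
risk
```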
Let’s use our shiny new optimal forecast.
This looks terrible, but let’s continue for now.
If our variable of interest is normally distributed, we can calculate a confidence interval for our predicted outcome.
\[ [\mu - \sigma Z_{\alpha/2}, \mu + \sigma Z_{\alpha/2}] \]
\[ [\hat{y} - \sigma Z_{\alpha/2}, \hat{y} + \sigma Z_{\alpha/2}] \]
\[ [280.3 - 120 * 1.96, 280.3 + 120 * 1.96] \]
\[ [45.1, 515.5] \]
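The interval above can be reproduced directly in R:

```r
# Reproduce the 95% interval from the slide: yhat = 280.3, sigma = 120
yhat  <- 280.3
sigma <- 120
z     <- 1.96                  # qnorm(0.975), rounded

c(lower = yhat - z * sigma, upper = yhat + z * sigma)
```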
air2 <- ts(mean(air), start = c(1961, 1), end = c(1962, 12), freq = 12)
air2_u <- air2 + (1.96 * sd(air))
air2_l <- air2 - (1.96 * sd(air))
plot(air, xlim = c(1949, 1963),
     ylim = c(min(air, air2_l),
              max(air, air2_u)))
lines(air2, col = "tomato", lwd = 2, lty = 2)
lines(air2_u, lty = 1, col = "tomato")
lines(air2_l, lty = 1, col = "tomato")

library("lubridate")
timez <- as_date(ymd("1949-01-01"):ymd("1962-12-01")); head(timez, 3)
[1] "1949-01-01" "1949-01-02" "1949-01-03"
timez <- floor_date(timez, "month"); head(timez, 3)
[1] "1949-01-01" "1949-01-01" "1949-01-01"
timez <- unique(timez); head(timez, 3)
[1] "1949-01-01" "1949-02-01" "1949-03-01"
air3 <- data.frame(time = timez,
                   air = c(air, rep(NA, 24)))
head(air3, 3)
        time air
1 1949-01-01 112
2 1949-02-01 118
3 1949-03-01 132
reg <- lm(air ~ 1, data = air3)
air3$pred <- predict(reg, air3)
air3$pred_l <- air3$pred - (1.96 * sd(air3$air, na.rm = TRUE))
air3$pred_u <- air3$pred + (1.96 * sd(air3$air, na.rm = TRUE))
plot(air3$time, air3$air,
     xlim = ymd(c("1949-01-01", "1963-01-01")),
     ylim = c(min(air3$air, air3$pred_l, na.rm = TRUE),
              max(air3$air, air3$pred_u, na.rm = TRUE)),
     type = "l", xlab = "", ylab = "Frequency")
lim <- air3$time > ymd("1960-12-01")
lines(air3$time[lim], air3$pred[lim], col = "tomato", lty = 2)
lines(air3$time[lim], air3$pred_l[lim], col = "tomato")
lines(air3$time[lim], air3$pred_u[lim], col = "tomato")

Everything we’ve talked about thus far has been as true for cross-sectional data as for time series data.
Let’s talk about time now.
Time, or rather periodicity, can come in many forms: annual, quarterly, monthly, weekly, daily, and so on.
This is called frequency.
A common feature of time series data is intertemporal dependency. For this, we have lags and leads.
Lags: \(y_{t-1}\), \(y_{t-2}\), ..., \(y_{t-k}\)
Leads: \(y_{t+1}\), \(y_{t+2}\), ..., \(y_{t+k}\)
Periods:
Sample: \(\{y_{1}, y_{2}, y_{3}, ..., y_{T}\}\)
Out of Sample: \(\{y_{T+1}, y_{T+2}, ..., y_{T+h}\}\) where \(h\) is the horizon
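A minimal base-R sketch of building lags and leads (the series here is made up for illustration):

```r
# Sketch: lags and leads of a series in base R
y <- c(5, 3, 8, 6, 9)            # sample {y_1, ..., y_T}, T = 5

y_lag1  <- c(NA, head(y, -1))    # y_{t-1}: shift forward, pad front with NA
y_lead1 <- c(tail(y, -1), NA)    # y_{t+1}: shift backward, pad end with NA

cbind(y, y_lag1, y_lead1)
```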
Notation:
\(\hat{y}\) : \(y\) \(\implies\) \(\hat{y}_t\) : \(y_t\)
\(\hat{y}_t\) : \(y_t\) \(\implies\) \(\hat{y}_{T+h}\) : \(y_{T+h}\)
However, these are all unclear about which information set the forecast uses. Conditional notation makes it explicit:
\(\hat{y}_{T+h|t}\) : the forecast of \(y_{T+h}\) made with the information available at time \(t\).
Conditioning on relevant variables will improve the forecast (i.e., reduce risk).
Both point forecasts and forecast intervals are functions of conditioning variables.
Let’s call all available information \(\Omega_t\).
par(mfrow = c(1, 2))
b <- read.csv("../data/bball_allstars.csv")
plot(density(b$HEIGHT),
     xlim = c(65, 90), ylim = c(0, .12),
     main = "Height of Pro Basketball Players",
     xlab = paste("Mean:", round(mean(b$HEIGHT), 1)))
abline(v = mean(b$HEIGHT))
plot(density(b$HEIGHT[b$LEAGUE == "NBA"]),
     xlim = c(65, 90), ylim = c(0, .12),
     main = "Height of Pro Basketball Players",
     xlab = paste("Mean:", round(mean(b$HEIGHT[b$LEAGUE == "WNBA"]), 1), "|",
                  round(mean(b$HEIGHT[b$LEAGUE == "NBA"]), 1)),
     col = "dodgerblue")
lines(density(b$HEIGHT[b$LEAGUE == "WNBA"]),
      col = "tomato")
abline(v = mean(b$HEIGHT))
abline(v = mean(b$HEIGHT[b$LEAGUE == "NBA"]),
       col = "dodgerblue", lty = 2)
abline(v = mean(b$HEIGHT[b$LEAGUE == "WNBA"]),
       col = "tomato", lty = 2)
legend("topright",
       legend = c("WNBA", "NBA"),
       col = c("tomato", "dodgerblue"),
       lty = 1, bty = "n")

par(mfrow = c(1, 2))
# first initial A-J vs K-Z (case-insensitive)
b$letter <- toupper(substr(b$PLAYER, 1, 1)) %in% LETTERS[1:10]
plot(density(b$HEIGHT),
     xlim = c(65, 90), ylim = c(0, .12),
     main = "Height of Pro Basketball Players",
     xlab = paste("Mean:", round(mean(b$HEIGHT), 1)))
abline(v = mean(b$HEIGHT))
plot(density(b$HEIGHT[b$letter]),
     xlim = c(65, 90), ylim = c(0, .12),
     main = "Height of Pro Basketball Players",
     xlab = paste("Mean:", round(mean(b$HEIGHT[!b$letter]), 1), "|",
                  round(mean(b$HEIGHT[b$letter]), 1)),
     col = "dodgerblue")
lines(density(b$HEIGHT[!b$letter]),
      col = "tomato")
abline(v = mean(b$HEIGHT))
abline(v = mean(b$HEIGHT[b$letter]),
       col = "dodgerblue", lty = 2)
abline(v = mean(b$HEIGHT[!b$letter]),
       col = "tomato", lty = 2)
legend("topright",
       legend = c("First Name A:J", "First Name K:Z"),
       col = c("tomato", "dodgerblue"),
       lty = 1, bty = "n")

Just like with cross-sectional settings, we may (or may not) observe some additional variables. However, we still need to model the Data Generating Process (DGP).
Data are not a choice, per se, but you do choose functional forms and specifications.
Once we select a model, we need to estimate the model’s parameters with data.
\[y_t = f(y_{t-1}, x_{t}, x_{t-1}, \tau_t, C_t, S_t)\]
\[y_t = \tau_t + C_t + S_t\]
Trend
Cycle
Season
It is useful to consider these things separately (additively)
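The additive split can be seen concretely with R's built-in `decompose()`, which separates a series into trend, seasonal, and remainder components (it has no separate cycle term; anything cyclical ends up in the trend or remainder). `AirPassengers` is R's built-in airline series, assumed here to be comparable to the slides' `air` object:

```r
# Sketch: additive decomposition y_t = trend + seasonal + remainder,
# using R's built-in AirPassengers series
d <- decompose(AirPassengers, type = "additive")

# The components add back up to the original series wherever all are defined
recon <- d$trend + d$seasonal + d$random
head(cbind(AirPassengers, recon), 8)
```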
The simplest model has an intercept, but no trend, cycle, or season.
\[E[y_{t+h}|\Omega_t] = \beta_0\]
This is simple, but what type of setting would this be appropriate for?
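A minimal sketch of the intercept-only model (using a simulated placeholder series, not the slides' data): regressing \(y\) on a constant estimates \(\beta_0\) as the sample mean, so every forecast at every horizon is just \(\text{mean}(y)\).

```r
# Sketch: the intercept-only model. The fitted beta_0 is the sample mean,
# so the forecast E[y_{t+h} | Omega_t] = mean(y) for every horizon h.
set.seed(42)
y  <- rnorm(100, mean = 10)     # placeholder series
df <- data.frame(y = y)

fit <- lm(y ~ 1, data = df)     # regression on a constant only
coef(fit)                        # (Intercept) equals mean(y)
```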
plot(consume$DATE[consume$DATE < ymd("2020-01-01")],
     consume$PCEC96_PC1[consume$DATE < ymd("2020-01-01")],
     xlim = c(min(consume$DATE), max(consume$DATE)),
     type = "l",
     xlab = "Month",
     ylab = "Percent Change (Year over Year)",
     main = "Real Personal Consumption Expenditures (PCEC96)")
abline(h = 0, lty = 2)
abline(h = mean(consume$PCEC96_PC1[consume$DATE < ymd("2020-01-01")]),
       col = "dodgerblue", lwd = 2)

plot(consume$DATE[consume$DATE < ymd("2020-01-01")],
     consume$PCEC96_PC1[consume$DATE < ymd("2020-01-01")],
     xlim = c(min(consume$DATE), max(consume$DATE)),
     type = "l",
     xlab = "Month",
     ylab = "Percent Change (Year over Year)",
     main = "Real Personal Consumption Expenditures (PCEC96)")
abline(h = 0, lty = 2)
m <- consume$PCEC96_PC1[consume$DATE < ymd("2020-01-01")]
segments(x0 = ymd("2020-01-01"),
         x1 = max(consume$DATE),
         y0 = mean(m),
         lty = 1, lwd = 2, col = "mediumseagreen")
segments(x0 = ymd("2020-01-01"),
         x1 = max(consume$DATE),
         y0 = mean(m) + 1.645 * sd(m),
         lty = 2, lwd = 2, col = "mediumseagreen")
segments(x0 = ymd("2020-01-01"),
         x1 = max(consume$DATE),
         y0 = mean(m) - 1.645 * sd(m),
         lty = 2, lwd = 2, col = "mediumseagreen")
legend("bottomright", horiz = TRUE,
       legend = c("Point Forecast", "90% CI"),
       lty = 1:2, col = "mediumseagreen", bty = "n")

Forecast Errors
\(e_t = y_{t+h} - E[y_{t+h}|\Omega_t] \implies y_{t+h} = E[y_{t+h}|\Omega_t] + e_t\)
Residuals
\(\hat{e}_t = y_{t+h} - \hat{y}_{t+h} = y_{t+h} - \hat{\beta}_0\)
It may be helpful to plot the residuals over time to see if there are any remaining patterns.
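A sketch of that check (simulated data, not from the slides): here the intercept-only forecast ignores a trend, and the trend shows up plainly in the residuals.

```r
# Sketch: residuals from an intercept-only forecast, plotted over time.
# A visible trend or seasonality left in the residuals suggests the
# model is missing a component.
set.seed(7)
t <- 1:120
y <- 0.05 * t + rnorm(120)     # series with a trend the model ignores

res <- y - mean(y)             # residuals from yhat = beta_0 = mean(y)
plot(t, res, type = "l", xlab = "t", ylab = "Residual")
abline(h = 0, lty = 2)
```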
Forecast Variance
Trend Models
ECON 707/807: Econometrics II